Overview

Dataset statistics

Number of variables: 14
Number of observations: 248258
Missing cells: 40673
Missing cells (%): 1.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 26.5 MiB
Average record size in memory: 112.0 B
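
The percentages and the average record size follow from the counts above. A minimal sketch of that arithmetic, assuming the overall missing percentage is taken over all cells (variables × observations); the variable names are illustrative and not part of the report:

```python
# Counts copied from the dataset statistics.
n_variables = 14
n_observations = 248258
missing_cells = 40673
total_size_mib = 26.5

# Assumption: "Missing cells (%)" is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size: total memory divided by the number of rows.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.2f} B")
```

The record size agrees with the reported 112.0 B up to the rounding of the 26.5 MiB total. That figure is also what 14 columns of 8 bytes each would occupy, though the source does not state the column dtypes.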

Variable types

NUM: 9
CAT: 3
DATE: 1
BOOL: 1

Warnings

VERSIE has constant value "1" (248258 rows) Constant
DATUM_BESTAND has constant value "2020-07-21" (248258 rows) Constant
PEILDATUM has constant value "2020-07-01" (248258 rows) Constant
TYPERENDE_DIAGNOSE_CD has a high cardinality: 1766 distinct values High cardinality
AANTAL_SUBTRAJECT_PER_ZPD is highly correlated with AANTAL_PAT_PER_ZPD High correlation
AANTAL_PAT_PER_ZPD is highly correlated with AANTAL_SUBTRAJECT_PER_ZPD High correlation
AANTAL_SUBTRAJECT_PER_DIAG is highly correlated with AANTAL_PAT_PER_DIAG High correlation
AANTAL_PAT_PER_DIAG is highly correlated with AANTAL_SUBTRAJECT_PER_DIAG High correlation
AANTAL_SUBTRAJECT_PER_SPC is highly correlated with AANTAL_PAT_PER_SPC High correlation
AANTAL_PAT_PER_SPC is highly correlated with AANTAL_SUBTRAJECT_PER_SPC High correlation
GEMIDDELDE_VERKOOPPRIJS has 40673 (16.4%) missing values Missing
AANTAL_SUBTRAJECT_PER_ZPD is highly skewed (γ1 = 21.19022168) Skewed
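
The skewness warning refers to γ1, the third standardized moment. A minimal sketch of the population form on small made-up samples follows. The report's exact estimator (for example a bias-corrected variant) is not stated in the source, so its values may differ slightly; the function name and the data are illustrative.

```python
def skewness(values):
    """Population skewness: third central moment over the second central moment to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical counts with a long right tail, similar in shape to a count column.
right_tailed = [1, 1, 2, 3, 14, 105, 1860]
# Hypothetical symmetric sample for comparison.
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed))  # positive for a long right tail
print(skewness(symmetric))     # zero for symmetric data
```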

Reproduction

Analysis started: 2020-09-06 22:36:21.016456
Analysis finished: 2020-09-06 22:36:49.545942
Duration: 28.53 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

VERSIE
Boolean

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Value    Count     Frequency (%)
1        248258    100.0%

DATUM_BESTAND
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Value         Count     Frequency (%)
2020-07-21    248258    100.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

PEILDATUM
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Value         Count     Frequency (%)
2020-07-01    248258    100.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

JAAR
Date

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB
Minimum: 2012-01-01 00:00:00
Maximum: 2020-01-01 00:00:00
Histogram with fixed size bins (bins=9)

BEHANDELEND_SPECIALISME_CD
Real number (ℝ≥0)

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 423.6569899
Minimum: 301
Maximum: 8418
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 301
5th percentile: 302
Q1: 305
Median: 313
Q3: 322
95th percentile: 335
Maximum: 8418
Range: 8117
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 929.0263119
Coefficient of variation (CV): 2.192873797
Kurtosis: 69.91205688
Mean: 423.6569899
Median Absolute Deviation (MAD): 8
Skewness: 8.473463086
Sum: 105176237
Variance: 863089.8881
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=27)

Common values

Value                Count    Frequency (%)
305                  35119    14.1%
313                  32268    13.0%
303                  28592    11.5%
330                  19976    8.0%
316                  16903    6.8%
308                  12492    5.0%
324                  10303    4.2%
306                  10283    4.1%
301                  10130    4.1%
304                  8132     3.3%
Other values (17)    64060    25.8%

Minimum 5 values

Value    Count    Frequency (%)
301      10130    4.1%
302      5384     2.2%
303      28592    11.5%
304      8132     3.3%
305      35119    14.1%

Maximum 5 values

Value    Count    Frequency (%)
8418     3301     1.3%
1900     162      0.1%
390      616      0.2%
389      2709     1.1%
362      3826     1.5%
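
Several of the descriptive statistics are simple functions of the others. The sketch below recomputes the coefficient of variation, variance, range, and IQR from the figures reported for BEHANDELEND_SPECIALISME_CD, and shows one common definition of the MAD on a small made-up sample. The helper name and the sample are illustrative, and the report's exact MAD definition is an assumption.

```python
import statistics

# Figures copied from the BEHANDELEND_SPECIALISME_CD statistics.
mean, std = 423.6569899, 929.0263119
minimum, maximum = 301, 8418
q1, q3 = 305, 322

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

def median_abs_deviation(values):
    """Median of the absolute deviations from the median (assumed definition)."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

# Hypothetical sample of specialism codes, not taken from the dataset.
codes = [301, 303, 305, 305, 313, 313, 330]

print(cv, variance, value_range, iqr)
print(median_abs_deviation(codes))
```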
 

TYPERENDE_DIAGNOSE_CD
Categorical

HIGH CARDINALITY

Distinct: 1766
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Value                  Count     Frequency (%)
101                    1045      0.4%
402                    1025      0.4%
403                    994       0.4%
301                    993       0.4%
201                    940       0.4%
203                    936       0.4%
401                    843       0.3%
404                    830       0.3%
409                    812       0.3%
802                    803       0.3%
Other values (1756)    239037    96.3%

Frequencies of value counts

Unique

Unique: 3
Unique (%): < 0.1%

Histogram of lengths of the category

Length

Max length: 4
Median length: 3
Mean length: 3.347960589
Min length: 2
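
The distinct, unique, and length figures for a categorical column can be sketched as follows. The codes below are made up for illustration, and treating "Unique" as the number of values that occur exactly once is an assumption, although it is consistent with the constant columns reporting 1 distinct value and 0 unique values.

```python
import statistics
from collections import Counter

# Hypothetical diagnosis codes stored as strings, not taken from the dataset.
codes = ["101", "402", "403", "101", "2011", "07"]

counts = Counter(codes)
distinct = len(counts)                                 # number of different values
unique = sum(1 for c in counts.values() if c == 1)     # values occurring exactly once (assumed meaning)

lengths = [len(code) for code in codes]
length_stats = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": statistics.mean(lengths),
    "min": min(lengths),
}

print(distinct, unique, length_stats)
```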

ZORGPRODUCT_CD
Real number (ℝ≥0)

Distinct: 5886
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 438593518.7
Minimum: 10501002
Maximum: 998418081
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 10501002
5th percentile: 28999036
Q1: 99799023
Median: 149599012
Q3: 990004004
95th percentile: 990416042
Maximum: 998418081
Range: 987917079
Interquartile range (IQR): 890204981

Descriptive statistics

Standard deviation: 428548956.9
Coefficient of variation (CV): 0.9770982438
Kurtosis: -1.726751426
Mean: 438593518.7
Median Absolute Deviation (MAD): 119599999
Skewness: 0.478481953
Sum: 1.088843498e+14
Variance: 1.836542084e+17
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count     Frequency (%)
990004009              1844      0.7%
990004007              1798      0.7%
990003004              1758      0.7%
990004006              1416      0.6%
990356076              1246      0.5%
990356073              1151      0.5%
990003007              1142      0.5%
131999228              1128      0.5%
131999164              1106      0.4%
199299013              1047      0.4%
Other values (5876)    234622    94.5%

Minimum 5 values

Value       Count    Frequency (%)
10501002    6        < 0.1%
10501003    9        < 0.1%
10501004    9        < 0.1%
10501005    9        < 0.1%
10501007    3        < 0.1%

Maximum 5 values

Value        Count    Frequency (%)
998418081    115      < 0.1%
998418080    100      < 0.1%
998418079    27       < 0.1%
998418077    6        < 0.1%
998418076    6        < 0.1%

AANTAL_PAT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8621
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 493.6843123
Minimum: 1
Maximum: 153137
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 13
Q3: 96
95th percentile: 1648
Maximum: 153137
Range: 153136
Interquartile range (IQR): 94

Descriptive statistics

Standard deviation: 3086.960776
Coefficient of variation (CV): 6.252904334
Kurtosis: 390.858878
Mean: 493.6843123
Median Absolute Deviation (MAD): 12
Skewness: 16.55051804
Sum: 122561080
Variance: 9529326.831
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count     Frequency (%)
1                      41787     16.8%
2                      20547     8.3%
3                      13214     5.3%
4                      9830      4.0%
5                      7613      3.1%
6                      6396      2.6%
7                      5357      2.2%
8                      4434      1.8%
9                      4062      1.6%
10                     3647      1.5%
Other values (8611)    131371    52.9%

Minimum 5 values

Value    Count    Frequency (%)
1        41787    16.8%
2        20547    8.3%
3        13214    5.3%
4        9830     4.0%
5        7613     3.1%

Maximum 5 values

Value     Count    Frequency (%)
153137    1        < 0.1%
152973    1        < 0.1%
144742    1        < 0.1%
133196    1        < 0.1%
112286    1        < 0.1%

AANTAL_SUBTRAJECT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 9207
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 575.5768475
Minimum: 1
Maximum: 239907
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 14
Q3: 105
95th percentile: 1860
Maximum: 239907
Range: 239906
Interquartile range (IQR): 102

Descriptive statistics

Standard deviation: 3891.722316
Coefficient of variation (CV): 6.761429569
Kurtosis: 720.4087631
Mean: 575.5768475
Median Absolute Deviation (MAD): 13
Skewness: 21.19022168
Sum: 142891557
Variance: 15145502.58
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count     Frequency (%)
1                      40331     16.2%
2                      20200     8.1%
3                      13111     5.3%
4                      9661      3.9%
5                      7556      3.0%
6                      6395      2.6%
7                      5328      2.1%
8                      4426      1.8%
9                      3986      1.6%
10                     3653      1.5%
Other values (9197)    133611    53.8%

Minimum 5 values

Value    Count    Frequency (%)
1        40331    16.2%
2        20200    8.1%
3        13111    5.3%
4        9661     3.9%
5        7556     3.0%

Maximum 5 values

Value     Count    Frequency (%)
239907    1        < 0.1%
232508    1        < 0.1%
231005    1        < 0.1%
227757    1        < 0.1%
219697    1        < 0.1%

AANTAL_PAT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7490
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7471.30391
Minimum: 1
Maximum: 210005
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5th percentile: 35
Q1: 367
Median: 1608
Q3: 6094
95th percentile: 36157
Maximum: 210005
Range: 210004
Interquartile range (IQR): 5727

Descriptive statistics

Standard deviation: 17518.84739
Coefficient of variation (CV): 2.344817934
Kurtosis: 32.78573724
Mean: 7471.30391
Median Absolute Deviation (MAD): 1477
Skewness: 5.010108012
Sum: 1854810966
Variance: 306910014
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count     Frequency (%)
21                     452       0.2%
9                      438       0.2%
4                      433       0.2%
20                     415       0.2%
17                     414       0.2%
14                     404       0.2%
2                      403       0.2%
27                     394       0.2%
8                      391       0.2%
19                     388       0.2%
Other values (7480)    244126    98.3%

Minimum 5 values

Value    Count    Frequency (%)
1        342      0.1%
2        403      0.2%
3        331      0.1%
4        433      0.2%
5        345      0.1%

Maximum 5 values

Value     Count    Frequency (%)
210005    25       < 0.1%
209295    19       < 0.1%
205192    17       < 0.1%
202588    17       < 0.1%
200190    16       < 0.1%

AANTAL_SUBTRAJECT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8257
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10488.71636
Minimum: 1
Maximum: 340654
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5th percentile: 43
Q1: 471
Median: 2184
Q3: 8495
95th percentile: 50492
Maximum: 340654
Range: 340653
Interquartile range (IQR): 8024

Descriptive statistics

Standard deviation: 25354.07365
Coefficient of variation (CV): 2.417271358
Kurtosis: 36.96084773
Mean: 10488.71636
Median Absolute Deviation (MAD): 2021
Skewness: 5.281167133
Sum: 2603907747
Variance: 642829050.4
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count     Frequency (%)
46                     370       0.1%
4                      363       0.1%
19                     348       0.1%
2                      341       0.1%
7                      340       0.1%
18                     339       0.1%
13                     337       0.1%
15                     336       0.1%
11                     331       0.1%
34                     329       0.1%
Other values (8247)    244824    98.6%

Minimum 5 values

Value    Count    Frequency (%)
1        298      0.1%
2        341      0.1%
3        315      0.1%
4        363      0.1%
5        313      0.1%

Maximum 5 values

Value     Count    Frequency (%)
340654    25       < 0.1%
338481    19       < 0.1%
323773    20       < 0.1%
300764    17       < 0.1%
294010    17       < 0.1%

AANTAL_PAT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 241
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 654642.3406
Minimum: 49
Maximum: 1489781
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 49
5th percentile: 37910
Q1: 246331
Median: 744820
Q3: 995699
95th percentile: 1340532
Maximum: 1489781
Range: 1489732
Interquartile range (IQR): 749368

Descriptive statistics

Standard deviation: 427431.7618
Coefficient of variation (CV): 0.6529241012
Kurtosis: -1.147838894
Mean: 654642.3406
Median Absolute Deviation (MAD): 321866
Skewness: 0.02596502876
Sum: 1.625201982e+11
Variance: 1.82697911e+11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                 Count     Frequency (%)
881057                5102      2.1%
874401                4355      1.8%
843847                4348      1.8%
889324                4325      1.7%
865835                4263      1.7%
737536                4020      1.6%
1083151               3891      1.6%
1066686               3851      1.6%
1069100               3841      1.5%
1040555               3810      1.5%
Other values (231)    206452    83.2%

Minimum 5 values

Value    Count    Frequency (%)
49       8        < 0.1%
140      40       < 0.1%
296      87       < 0.1%
889      54       < 0.1%
1069     100      < 0.1%

Maximum 5 values

Value      Count    Frequency (%)
1489781    2976     1.2%
1450955    3054     1.2%
1422234    3564     1.4%
1340532    3541     1.4%
1333856    3547     1.4%

AANTAL_SUBTRAJECT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 242
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1032064.413
Minimum: 49
Maximum: 2558785
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 49
5th percentile: 41243
Q1: 351258
Median: 1010488
Q3: 1727801
95th percentile: 2187018
Maximum: 2558785
Range: 2558736
Interquartile range (IQR): 1376543

Descriptive statistics

Standard deviation: 727817.7627
Coefficient of variation (CV): 0.705205754
Kurtosis: -0.9645056149
Mean: 1032064.413
Median Absolute Deviation (MAD): 670622
Skewness: 0.2829950506
Sum: 2.562182471e+11
Variance: 5.297186956e+11
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                 Count     Frequency (%)
1211801               5102      2.1%
1281531               4355      1.8%
1215953               4348      1.8%
1305530               4325      1.7%
1273792               4263      1.7%
1053212               4020      1.6%
2553407               3891      1.6%
2496304               3851      1.6%
2558785               3841      1.5%
2068196               3810      1.5%
Other values (232)    206452    83.2%

Minimum 5 values

Value    Count    Frequency (%)
49       8        < 0.1%
140      40       < 0.1%
296      9        < 0.1%
302      78       < 0.1%
895      54       < 0.1%

Maximum 5 values

Value      Count    Frequency (%)
2558785    3841     1.5%
2553407    3891     1.6%
2496304    3851     1.6%
2187018    3757     1.5%
2068196    3810     1.5%

GEMIDDELDE_VERKOOPPRIJS
Real number (ℝ≥0)

MISSING

Distinct: 3061
Distinct (%): 1.5%
Missing: 40673
Missing (%): 16.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 3465.36703
Minimum: 70
Maximum: 287220
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 70
5th percentile: 140
Q1: 455
Median: 1210
Q3: 3935
95th percentile: 13130
Maximum: 287220
Range: 287150
Interquartile range (IQR): 3480

Descriptive statistics

Standard deviation: 6601.384753
Coefficient of variation (CV): 1.904959762
Kurtosis: 177.7395121
Mean: 3465.36703
Median Absolute Deviation (MAD): 985
Skewness: 8.099765665
Sum: 719358215
Variance: 43578280.66
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count     Frequency (%)
160                    1804      0.7%
110                    1667      0.7%
105                    1585      0.6%
180                    1400      0.6%
300                    1228      0.5%
120                    1225      0.5%
145                    1198      0.5%
140                    1197      0.5%
500                    1129      0.5%
310                    1106      0.4%
Other values (3051)    194046    78.2%
(Missing)              40673     16.4%

Minimum 5 values

Value    Count    Frequency (%)
70       226      0.1%
75       75       < 0.1%
80       360      0.1%
85       852      0.3%
90       541      0.2%

Maximum 5 values

Value     Count    Frequency (%)
287220    8        < 0.1%
148910    3        < 0.1%
142855    4        < 0.1%
122155    4        < 0.1%
116765    3        < 0.1%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
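
A minimal sketch of that calculation in plain Python, with made-up data; the function and variable names are illustrative and this is not the report's own implementation. The 1/n factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical patient and subtraject counts, not taken from the dataset.
patients = [1, 2, 13, 96, 1648]
subtrajects = [1, 3, 14, 105, 1860]

r = pearson_r(patients, subtrajects)

# Changing the location and scale of one variable leaves r unchanged.
rescaled = [10 * x + 5 for x in patients]
r_rescaled = pearson_r(rescaled, subtrajects)

print(r, r_rescaled)
```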

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
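
As a sketch of that procedure, the code below ranks each variable (tied values share the mean of their positions, which is an assumed convention) and then applies the Pearson formula to the ranks. All names and data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

print(spearman_rho(x, y), pearson_r(x, y))
```

On this example the rank correlation is perfect while Pearson's r falls below 1, which is the difference the text describes.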

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
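
The pair-counting definition can be sketched directly. This is the simple form described in the text (often called tau-a); libraries commonly apply a tie correction, so their values can differ when ties are present. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # pairs tied in either variable count as neither
    return (concordant - discordant) / len(pairs)

print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```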

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution.

Missing values


Sample

First rows

VERSIE DATUM_BESTAND PEILDATUM JAAR BEHANDELEND_SPECIALISME_CD TYPERENDE_DIAGNOSE_CD ZORGPRODUCT_CD AANTAL_PAT_PER_ZPD AANTAL_SUBTRAJECT_PER_ZPD AANTAL_PAT_PER_DIAG AANTAL_SUBTRAJECT_PER_DIAG AANTAL_PAT_PER_SPC AANTAL_SUBTRAJECT_PER_SPC GEMIDDELDE_VERKOOPPRIJS
01.02020-07-212020-07-012013-01-0132070997900118811211725471021797164225428685.0
11.02020-07-212020-07-012013-01-01320709998990517721172547102179716422541005.0
21.02020-07-212020-07-012013-01-013207099790012232221172547102179716422546910.0
31.02020-07-212020-07-012013-01-0132070999899056707221172547102179716422545380.0
41.02020-07-212020-07-012013-01-01320709998990264554652117254710217971642254210.0
51.02020-07-212020-07-012013-01-013207099790012201121172547102179716422544860.0
61.02020-07-212020-07-012013-01-0132070999899052332117254710217971642254NaN
71.02020-07-212020-07-012013-01-01320709998990128779182117254710217971642254480.0
81.02020-07-212020-07-012013-01-013207099989902827729121172547102179716422546525.0
91.02020-07-212020-07-012013-01-013207099790012191121172547102179716422546540.0

Last rows

VERSIE DATUM_BESTAND PEILDATUM JAAR BEHANDELEND_SPECIALISME_CD TYPERENDE_DIAGNOSE_CD ZORGPRODUCT_CD AANTAL_PAT_PER_ZPD AANTAL_SUBTRAJECT_PER_ZPD AANTAL_PAT_PER_DIAG AANTAL_SUBTRAJECT_PER_DIAG AANTAL_PAT_PER_SPC AANTAL_SUBTRAJECT_PER_SPC GEMIDDELDE_VERKOOPPRIJS
2482481.02020-07-212020-07-012018-01-013270216990027136112932556018622334082926050.0
2482491.02020-07-212020-07-012018-01-01327021699002714688902932556018622334082923880.0
2482501.02020-07-212020-07-012018-01-013270216990027147131329325560186223340829NaN
2482511.02020-07-212020-07-012018-01-013270216990027131849229325560186223340829165.0
2482521.02020-07-212020-07-012018-01-013270216990027135112932556018622334082940575.0
2482531.02020-07-212020-07-012018-01-013270216990027144101029325560186223340829NaN
2482541.02020-07-212020-07-012018-01-013270216990027150414429325560186223340829NaN
2482551.02020-07-212020-07-012018-01-01327021699002719932635829325560186223340829850.0
2482561.02020-07-212020-07-012018-01-013270216990027151477604293255601862233408293495.0
2482571.02020-07-212020-07-012018-01-0132702169900271982532426329325560186223340829220.0